Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms
Methods for adversarial robustness certification aim to provide an upper bound on the test error of a classifier under adversarial manipulation of its input. Current certification methods are computationally expensive and limited to attacks that optimize the manipulation with respect to a norm. We overcome these limitations by investigating the robustness properties of Nearest Prototype Classifiers (NPCs), such as learning vector quantization and large margin nearest neighbor. For this purpose, we study the hypothesis margin. We prove that if NPCs use a dissimilarity measure induced by a seminorm, the hypothesis margin is a tight lower bound on the size of adversarial attacks and can be calculated in constant time; this provides the first adversarial robustness certificate calculable in reasonable time. Finally, we show that each NPC trained by a triplet loss maximizes the hypothesis margin and is therefore optimized for adversarial robustness. In the presented evaluation, we demonstrate that NPCs optimized for adversarial robustness are competitive with state-of-the-art methods and set a new benchmark with respect to the computational complexity of robustness certification.
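The certificate described in the abstract can be sketched in a few lines: for a test point, the hypothesis margin is half the gap between the distance to the closest prototype of the correct class and the distance to the closest prototype of any other class. The following minimal sketch assumes the dissimilarity is a p-norm (a special case of the seminorms covered by the paper); function and variable names are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def certified_radius(x, prototypes, labels, y, ord=2):
    """Hypothesis-margin robustness certificate for an NPC (sketch).

    x          : input vector, shape (d,)
    prototypes : prototype matrix, shape (k, d)
    labels     : class label of each prototype, shape (k,)
    y          : true class of x
    ord        : order of the norm inducing the dissimilarity
    """
    # Distances from x to all prototypes under the chosen norm.
    d = np.linalg.norm(prototypes - x, ord=ord, axis=1)
    d_plus = d[labels == y].min()    # closest prototype of the correct class
    d_minus = d[labels != y].min()   # closest prototype of any other class
    # Half the distance gap lower-bounds the size of any successful attack;
    # a non-positive value means x is already misclassified.
    return 0.5 * (d_minus - d_plus)
```

Because only the precomputed prototype distances are needed, the certificate costs one pass over the prototypes per input, which is what makes it fast compared with optimization-based certification.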
Review for NeurIPS paper: Fast Adversarial Robustness Certification of Nearest Prototype Classifiers for Arbitrary Seminorms
Additional Feedback: Overall, this paper is well presented and technically sound. However, I believe its technical contribution is minor and it does not have a significant impact on this field. Thus I vote for a weak reject. To increase the contribution of this paper, the authors could consider designing training algorithms that improve the provable robustness of NPCs. For example, RSLVQ is a strong method (in Table 1 it achieves very competitive clean test error); can its robustness be improved to the same level as the other baselines?